Job Description:
• Experience working on large data sets in a distributed computing environment such as Hadoop or Spark
• Understanding of the Spark framework and tuning of Spark applications
• Extensive experience with horizontally scalable and highly available system design and implementation, with a focus on performance and resiliency
• Extensive experience ingesting, cleaning, transforming, and aggregating massive amounts of data from multiple internal and external sources using Azure Databricks
• Create a data pipeline that enables various systems within the ecosystem to stream high-volume data from multiple sources into a central repository for processing (see the streaming sketch after this list)
• Experience building data pipelines using the Azure big data stack
• Setup and configuration of Azure Databricks
• Working knowledge of message queueing and stream processing using Azure Databricks notebooks / JARs
• Reporting at scale enabled by Databricks Delta for incremental processing (see the incremental MERGE sketch after this list)
• End-to-end solutions containing Spark (SQL, Structured Streaming), SQL Data Warehouse, Azure Data Lake, Azure Cosmos DB, and Power BI (for visualization)
• Exposure to Data & Analytics and Cloud technologies
• Set up different cloud environments for Dev, QA, and Prod
• Should have recent Azure experience
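
To make the streaming-pipeline bullet concrete, here is a minimal PySpark sketch of the kind of job the role describes: landing high-volume events from a message queue into a central Delta table on Azure Data Lake. It assumes a Kafka-compatible source and a simple JSON event shape; the broker, topic, schema, and storage paths are hypothetical placeholders, not details from the posting.

```python
# Minimal sketch: stream events from a Kafka-compatible source into a
# central Delta table on Azure Data Lake. All names are placeholders.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StructType, StructField, StringType, TimestampType

spark = SparkSession.builder.appName("ingest-events").getOrCreate()

# Assumed shape of each event payload (illustrative only).
event_schema = StructType([
    StructField("device_id", StringType()),
    StructField("event_type", StringType()),
    StructField("event_time", TimestampType()),
])

raw = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")  # placeholder broker
    .option("subscribe", "telemetry")                  # placeholder topic
    .load()
)

# Kafka delivers raw bytes; parse the JSON value into typed columns.
events = (
    raw.select(from_json(col("value").cast("string"), event_schema).alias("e"))
       .select("e.*")
)

# Append continuously into a Delta table; the checkpoint location makes
# the pipeline restartable with exactly-once sink semantics.
query = (
    events.writeStream
    .format("delta")
    .option("checkpointLocation",
            "abfss://lake@account.dfs.core.windows.net/checkpoints/telemetry")
    .outputMode("append")
    .start("abfss://lake@account.dfs.core.windows.net/bronze/telemetry")
)
query.awaitTermination()
```

On Databricks, the Kafka connector and Delta Lake are available out of the box; on plain Spark the equivalent packages would need to be attached to the cluster.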
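
And for the incremental-processing bullet, a minimal sketch of Delta-based incremental reporting: aggregate only the newest slice of the landed data and MERGE it into a reporting table, so dashboards never reprocess full history. The table paths and the target schema (device_id, event_date, event_count) are assumptions for illustration.

```python
# Minimal sketch: incremental upsert into a Delta reporting table.
# Paths and schemas are hypothetical placeholders.
from pyspark.sql import SparkSession, functions as F
from delta.tables import DeltaTable

spark = SparkSession.builder.appName("incremental-report").getOrCreate()

# Read only today's increment from the landed telemetry (assumed layout).
updates = (
    spark.read.format("delta")
    .load("abfss://lake@account.dfs.core.windows.net/bronze/telemetry")
    .where("event_time >= current_date()")
)

# Aggregate the increment to match the reporting table's assumed schema.
daily = (
    updates.groupBy("device_id", F.to_date("event_time").alias("event_date"))
           .agg(F.count("*").alias("event_count"))
)

report = DeltaTable.forPath(
    spark, "abfss://lake@account.dfs.core.windows.net/gold/device_daily"
)

# MERGE applies inserts/updates in place; Delta's transaction log keeps
# concurrent readers (e.g. Power BI) consistent while the upsert runs.
(
    report.alias("t")
    .merge(daily.alias("s"),
           "t.device_id = s.device_id AND t.event_date = s.event_date")
    .whenMatchedUpdateAll()
    .whenNotMatchedInsertAll()
    .execute()
)
```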